ERIC 



DOCUMENT RESUME 



ED 234 375 CS 007 317 



AUTHOR          Johnston, Peter
TITLE           Prior Knowledge and Reading Comprehension Test Bias.
                Technical Report No. 289.
INSTITUTION     Bolt, Beranek and Newman, Inc., Cambridge, Mass.;
                Illinois Univ., Urbana. Center for the Study of
                Reading.
SPONS AGENCY    National Inst. of Education (ED), Washington, DC.
PUB DATE        Sep 83
CONTRACT        400-76-0116
NOTE            57p.
PUB TYPE        Reports - Research/Technical (143) -- Information
                Analyses (070)
EDRS PRICE      MF01/PC03 Plus Postage.
DESCRIPTORS     Elementary Secondary Education; Grade 8;
                Learning; *Reading Comprehension; Reading Diagnosis;
                *Reading Research; *Test Bias; *Testing Problems;
                Test Interpretation; Test Items; Test Reviews; *Test
                Validity



ABSTRACT 

To show the difficulty of eliminating test bias and
to develop a methodology for distinguishing between the effects of
prior knowledge and of skill development in reading comprehension,
207 eighth grade students from rural and urban areas were
administered an 18-question reading comprehension test. Quantitative
and qualitative effects of prior knowledge on reading comprehension
were demonstrated through an examination of student performance on
the test's different types of questions: (1) textually explicit,
drawing on information directly stated in a single sentence
of text; (2) textually implicit, requiring a synthesis of
information; and (3) scriptally implicit, demanding background
knowledge. The study suggests that test scores are biased by prior
knowledge and reflect the students' I.Q. more than specific
comprehension skills. The findings indicate that test bias can be
lessened by asking central, rather than peripheral, questions on
passages for comprehension and by using a content-specific vocabulary
test to estimate the individual's prior knowledge. (MM)






CENTER FOR THE STUDY OF READING 



Technical Report No. 289 

PRIOR KNOWLEDGE AND READING
COMPREHENSION TEST BIAS

Peter Johnston 
State University of New York at Albany 

September 1983 



University of Illinois 

at Urbana-Champaign 
51 Gerty Drive 
Champaign, Illinois 61820 



Bolt Beranek and Newman Inc. 

50 Moulton Street

Cambridge, Massachusetts 02238 



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION

EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

[X] This document has been reproduced as
    received from the person or organization
    originating it.
[ ] Minor changes have been made to improve
    reproduction quality.

  * Points of view or opinions stated in this document
    do not necessarily represent official NIE
    position or policy.



The research reported herein was supported in part by the National Institute
of Education under Contract No. HEW-NIE-C-400-76-0116 while the author was
at the Center for the Study of Reading at the University of Illinois. The
paper is part of the author's doctoral dissertation and considerable thanks
are due to (alphabetically) Dick Anderson, Bob Linn, George McConkie, David
Pearson, and Peter Winograd.



EDITORIAL BOARD 



William Nagy
Editor

Harry Blanchard
Wayne Blizzard
Nancy Bryant
Pat Chrosniak
Avon Crismore
Linda Fielding
Dan Foertsch
Meg Gallagher
Beth Gudbrandsen



Anne Hay
Patricia Herman
Asghar Iran-Nejad
Margaret O. Laff
Brian Nash 
Theresa Rogers 
Terry Turner 
Paul Wilson 



Test Bias

Abstract 

This paper addresses the problem of the effects of prior knowledge,
especially those relating to bias, in tests of reading comprehension.
Quantitative and qualitative effects of prior knowledge on reading
comprehension were demonstrated through an examination of performance on
different question types. The availability of the text during question
answering was also found to influence performance on certain question
types. Peripheral textual items were most sensitive to such influence;
central items and scriptal items were least sensitive. Performance on
central questions actually improved when readers could not refer back to
the text. The biasing effects of prior knowledge were demonstrated both
within subjects and between subpopulations (rural and urban). Bias was
shown to operate at the level of the individual, suggesting that it should
be removed at that level, not at the population level. This was achieved
by using a content-specific vocabulary test to estimate prior knowledge.
This incidentally resulted in a decrease in the bias due to intelligence.
A conventional approach to bias removal (collapsing across several text
content areas) also removed the bias due to prior knowledge, but at the
same time it increased the bias due to intelligence. This latter bias was
also found to be increased when readers were able to refer back to the text
while answering the questions. Results are interpreted to suggest
modifications of current reading comprehension tests and methods of dealing
with bias.






Prior Knowledge and Reading Comprehension Test Bias 

The basic premise of this paper is that reading comprehension test
scores are affected by both an individual's reading comprehension ability
and his or her prior knowledge. The main thesis involves a demonstration
of the consequences of our inability to distinguish between these two
sources of test score variance. A second thesis is a description of a
possible solution to the problem.

Prior Knowledge and Reading Comprehension

For many years it has been known that prior knowledge influences what
is understood from text (e.g., Bartlett, 1932; Reynolds, Taylor,
Steffensen, Shirey, & Anderson, 1981). Several studies have suggested that
prior knowledge is an integral part of the comprehending process (Bransford
& Johnson, 1972; Johnston, 1981). This implies that two individuals
equal in reading comprehension ability but differing in prior knowledge
would, in all likelihood, exhibit different levels of comprehension of the
same text. Such differences are thus likely to show up in assessments of
reading comprehension ability, and there is no way of knowing what part of
an individual's score is due to reading comprehension ability and what to
prior knowledge. Thus attempts to compare several individuals in terms of
their reading comprehension ability are confounded by the differences in
their relevant prior knowledge. Findings are then subject to
misinterpretation. One student may do very poorly because of a lack of
prior knowledge whereas another student, with perfectly adequate prior
knowledge, may do poorly because of inadequate reading comprehension
skills. It seems important to distinguish between such sources of failure
since each requires quite different assistance.

Test bias is any factor other than that being measured which
systematically influences an individual's test score. Prior knowledge
constitutes such a factor. The issue is what to do about the problem. We
could try to construct tests which are somehow less dependent on prior
knowledge. Alternatively, we could try to obtain an indication of that
part of the comprehension score which varies more closely with reading
comprehension ability than with prior knowledge, and hence provides a more
valid index of raw comprehension ability. The present paper is intended
to (a) show that the former approach cannot succeed, and (b) provide a
methodology which may allow us not only to get a less contaminated measure
of reading comprehension, but also to distinguish between individuals who
fail to comprehend because of prior knowledge mismatches and those who
fail because of inadequate skill development.

Current Approaches to Test Bias

Existing approaches to reading comprehension test bias all endeavor to
devise tests of reading comprehension which are independent of differences
in individuals' background knowledge. Three approaches have been used to
create such tests: broad topic coverage, passage dependency, and latent
trait models.

The first approach is evident in the current tests of reading
comprehension which use a number of relatively brief passages, each about a
different topic. This strategy is based on the idea that diverse text
topics ensure that, overall, each child meets a similar spread of familiar
and unfamiliar text. The probable net effect of such a strategy is to
ensure that readers with stronger general knowledge will be better prepared
for the test of reading comprehension (just as they would be for an I.Q.
test or for a vocabulary test).

The second bias reduction method is to eliminate test items which
students with extensive prior knowledge could answer before they read the
passage. Such questions are called passage (or context) independent (Hanna
& Oaster, 1978-79; Tuinman, 1974). If prior knowledge has extensive
effects on reading comprehension itself, it is not at all clear that this
will solve the problem.

Latent trait theory and related statistical models represent a third
potential solution to the problem (e.g., Linn, Levine, Hastings, & Wardrop,
1980). This group of methods is based on statistical theory rather than on
a theory of what is causing the bias. Indeed, Tuinman (1979) claims that
we have reached the functional limit of mathematical and statistical
models, their increased accuracy not being warranted by the accuracy of the
actual data. Furthermore, these techniques are based on population-level
differences such as skin color. Such population-level approaches seem
inadequate for several reasons. In the present context, variability
between populations will virtually always be considerably less than the
variability between individuals within those populations. In addition, one
must make a decision as to which of the many populations to choose as
reference groups (e.g., black/white, male/female, urban/rural).

On the larger scale, all of these approaches can be criticized because
their basic assumption, that it is possible to construct a reading
comprehension test which will produce a score which is immune to the
influence of prior knowledge, is erroneous. Since prior knowledge of a
topic cannot be equated across readers, we would need to construct a test
which was uninfluenced by prior knowledge. Unfortunately, prior knowledge
is an integral part of the reading comprehension process (Johnston, 1981;
Pearson & Johnson, 1978). Consequently, if test constructors managed to
produce a test in which performance was indeed unaffected by prior
knowledge, whatever it measured, it would not be measuring reading
comprehension.

If it is, as claimed, impossible to construct an unbiased test of
reading comprehension, one could simply concede that the test was biased
and obtain a measure of the extent of the bias. The information would be
used in the interpretation of the test rather than in its construction.
The challenge would be to find a measure of the bias for a given
individual. To do this, perhaps we should go to what seems to be the (or
at least a major) root of the problem, and look at individual differences
in prior knowledge as sources of bias. The question then becomes how to
estimate an individual's prior knowledge, and hence the probable test bias
for that individual.

Estimating Prior Knowledge

Studies of prior knowledge have generally used "familiar" versus
"unfamiliar" texts (e.g., Freebody, 1980) or skin color (e.g., Reynolds
et al., 1981) as estimates of prior knowledge. Two other approaches have
also been used. Hagerup-Neilsen (1977) and Raphael (1981) have had
subjects rate the familiarity of passages or topics. Unfortunately, aside
from the incomparability of different individuals' ratings, this procedure
requires metacognitive awareness. Pearson, Hansen, and Gordon (1979) took
a more direct approach. These investigators asked eight prior knowledge
questions before children read the passages. This seems to be a more
powerful approach but the questions tend to over-direct reading.
Furthermore, when the questions are highly related to the text, any related
improvement could be attributed to greater passage independence of the
items. Nonetheless, this more direct approach to the measurement of prior
knowledge was used in the present study with modifications which minimize
the above problems.

The major problem with any question construction is definitional. 
Definitions which allow consistent production of other specific item types 
still elude researchers. It is possible that what is required is a 
complete theory of the structure of knowledge so that one could generate 
for any subset of knowledge, appropriate indicators of prior knowledge. 
However, such a theoretical development is presently unavailable. 

A useful set of items should perhaps include some which are very text
specific, but these would tend to identify those readers for whom the text
contained little, if any, new information. That is, the items would
identify those readers for whom reading (that passage) was largely
recognition (Tuinman, 1979). Schema theory, however, assumes a more
widespread influence of prior knowledge. Consequently, these items alone
would be inadequate. Rather, items would need to be symptomatic of relevant
underlying schematic knowledge. For example, knowledge of the meanings of
certain relatively low frequency words might be diagnostic if the frequency
of use was somewhat higher amongst experts in the knowledge domain.
However, items which merely discriminate experts from nonexperts would not
be sufficient. A most useful set of items, from a purely functional
standpoint, would form a Guttman scale which would differentiate various
levels of expertise. This outcome probably would require successively less
specific items in order to distinguish the experts from the dilettantes, and
these from the novices, and so on. In Anderson and Freebody's (1979)
terms, we need a spread of items to assess the "depth" rather than the
"breadth" of relevant vocabulary. Currently we must take a pragmatic
approach to the selection of these items, tempered by such theory as
exists. Consequently, in the present study, prior knowledge was measured
by testing specific, content-related vocabulary knowledge.

There is, however, a problem with using a vocabulary measure as an
estimate of prior knowledge. It would not be difficult to build an
argument that vocabulary questions merely estimate general ability (I.Q.)
since intelligence tests contain vocabulary subtests. Such tests (and
subtests) are highly predictive of performance on tests of reading
comprehension. For example, factor analytic studies of reading
comprehension have invariably found a word knowledge factor on which
vocabulary tests load highly (e.g., Davis, 1944, 1968; Spearritt, 1972). In
studies of readability too, any index of vocabulary difficulty accounts for
about 80 percent of the predicted variance (Coleman, 1971).

Anderson and Freebody (1979) have examined the three competing
hypotheses which attempt to explain this finding: the instrumentalist, the
aptitude, and the background knowledge hypotheses. The instrumentalist
position is that knowing words allows text comprehension and not knowing
them means that one cannot proceed adequately through the text. The
aptitude hypothesis considers vocabulary knowledge as just another index of
I.Q., which is the real factor accounting for comprehension. The background
knowledge hypothesis suggests that vocabulary knowledge is a distal index
of background conceptual frameworks (schemata) necessary to understand
passages about a particular topic.

Although these hypotheses are not mutually exclusive, the study
presented in this paper will test the prior knowledge and general ability
hypotheses. That the vocabulary measure estimates prior knowledge and not
merely I.Q. will be ensured by a within-subjects design. That is, an
individual's I.Q. is relatively stable; thus variability in performance
over a two-hour period cannot readily be attributed to changes in general
ability.

Prior Knowledge and Question Type

The outcome measures from reading comprehension tests generally
provide a quantitative measure of "how much the reader has comprehended."
There are, however, possible qualitative differences between readers. For
example, the total score may be the same for two different readers, but if
one succeeded on all literal items and on none of the inferential items,
while the other performed equally well on each type, presumably there is a
qualitative difference in their comprehension of the text.

Perhaps prior knowledge differentially influences performance on
different question types (Pearson, Hansen, & Gordon, 1979). But what
constitutes a different type of question? Pearson and Johnson (1978) and
Lucas and McConkie (1980) have developed systems which make the same basic
distinctions among questions. These distinctions are exemplified in
Pearson and Johnson's system, which is really a classification of
question-answer relationships. The distinctions relate to the location of the
information required to and/or actually used to answer the question.
Textually Explicit (TE) items have both the question information and the
answer information stated in a single sentence in the text. Textually
Implicit (TI) items have the question information and response information
stated in different sentences in the text, requiring the reader to combine
the separate pieces of information in order to produce or recognize an
answer. In order to answer Scriptally Implicit (SI) questions, the reader
must combine some information from the text and some from background
knowledge (script). Based on the analysis of what is involved in answering
the different question types, it seems likely that the SI questions/answers
will be more influenced by prior knowledge than will other question types.
Indeed, Pearson et al. demonstrated this to be so. However, perhaps
answering the questions with the text available for reference (as in
standardized reading comprehension tests) would produce a different result.
For example, since textually implicit questions would then have the reader
dependent on memory for neither piece of information, their outcome should
become less influenced by prior knowledge.

Of course, prior knowledge may affect other qualitative aspects of the
outcome. For example, the reader's performance on more or less central
questions may differ depending on his prior knowledge and the extent to
which long-term memory is involved in the task. Conceptual dependency
theory (Schank, 1975) holds that knowledge is stored with respect to
central causal chains of underlying conceptualizations. When readers are
dependent upon their memories for information to answer questions, they are
likely to be able to respond more successfully to central items, since
central information is more likely to be stored than is peripheral
information. However, this may not be the case when long-term memory is
only minimally involved in the task, as when the reader can refer to the
text while answering questions.

Question classification in tests currently is based around a simple
literal versus inferential distinction. Pearson and Johnson's (1978)
descriptors represent a more refined version of this approach, yet there is
good reason to believe that the "centrality" of the information is also
very important. Omanson's (1982) work with the narrative analysis is
particularly noteworthy in this regard. It is of considerable theoretical
interest to see which set of variables is more important under different
task conditions. Pearson and Johnson's descriptors represent the presumed
information source, whereas centrality represents more the nature of the
information and how it relates to prior knowledge. Once the text has been
read and the reader is answering questions from memory, the information
source should become less meaningful, since it all must come from the
reader's head. However, because of the nature of the storage process, the
structural importance of the information is more likely to determine the
ability to respond to questions. On the other hand, when text is readily
available for referral during question answering (as in standardized
tests), it seems likely that location of information (within the text or in
the reader's head) should be a much stronger determinant of the reader's
responses than the relative centrality of the information. Search
strategies may be more critical, and storage should no longer be a problem.



Consequently, the present study used questions based both on Pearson
and Johnson's (1978) taxonomy and the centrality notion, to examine
possible differential biasing effects of prior knowledge on different types
of questions. Similarly, comprehension questions were presented both with
and without the text available to refer back to.

It was hypothesized that prior knowledge would account for a 
significant portion of reading comprehension variance within subjects, thus 
representing an important biasing factor. It was anticipated that the 
biasing effects would not be accounted for on the basis of the passage 
dependency of the questions and neither would the problem be removed by 
increasing the spread of text topics. Instead, increasing the spread of 
text topics was expected to increase the correlation between total reading 
comprehension score and I.Q. However, it was predicted that bias would be 
removable by estimating prior knowledge with a content-specific vocabulary 
test and producing residual comprehension scores.
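
The residual-score idea in the last prediction can be sketched as follows. This is only a minimal illustration, assuming a single least-squares line that predicts comprehension from the content-specific vocabulary score; the function name, variable names, and toy numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def residual_scores(vocabulary, comprehension):
    """Return comprehension residuals after removing the linear effect of vocabulary.

    Assumption: one simple least-squares line fitted across all readers.
    """
    mean_v, mean_c = mean(vocabulary), mean(comprehension)
    sum_sq_v = sum((v - mean_v) ** 2 for v in vocabulary)
    slope = sum((v - mean_v) * (c - mean_c)
                for v, c in zip(vocabulary, comprehension)) / sum_sq_v
    intercept = mean_c - slope * mean_v
    # Residual = observed comprehension minus the score predicted from vocabulary.
    return [c - (intercept + slope * v) for v, c in zip(vocabulary, comprehension)]


# Toy data (hypothetical): vocabulary subtest score and comprehension score per reader.
vocab = [3, 5, 7, 9, 11]
comp = [6, 9, 9, 14, 15]
residuals = residual_scores(vocab, comp)
```

Under these assumptions, a positive residual marks a reader who comprehended more than the vocabulary estimate of prior knowledge would predict, and a negative residual marks one who comprehended less.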

The effects of prior knowledge were also hypothesized to differ across 
question types depending on whether or not the text was available to refer 
back to while answering the questions. 

METHODOLOGY 
The Materials and Tasks 

Reading Comprehension 

Reading comprehension was assessed by having the students read and
answer 18 questions about each of three 650-750 word texts. The content
areas of the texts were:

(1) The specialization of corn in the U.S.

(2) The financial problems of the Chicago Regional Transit Authority
(RTA).

(3) The battle of Antietam Creek.

The first two topics were chosen for their likely bias toward rural and
city children, and the third for its presumed lack of bias (since the Civil
War is part of both groups' curricula). The Fry readability scores of
these texts were seventh grade (Civil War) and eighth grade (corn and RTA).
The texts were basically taken from a textbook (Civil War), an agriculture
handbook (corn), and two newspaper articles (RTA).

The 18 questions were constructed for each text with 6 of each type of
question in Pearson and Johnson's (1978) taxonomy: textually explicit,
textually implicit, and scriptally implicit. In addition, half of the
items for each question type tested information which was central to an
understanding of the text and half tested peripheral information. These
divisions were accomplished by having ten adult subjects rate on a 1-4
scale the centrality of a list of propositions derived from the passages.
Propositions were considered to be central if the mean rating was three or
higher, and peripheral if two or lower. This criterion generally meant
that there was at least 80% agreement among the adults in whether the item
was given one of the top two or bottom two ratings. The selected
propositions were then turned into multiple-choice questions by generating
alternatives such that two of the distractors maintained some of the
surface characteristics of the text. Each set of questions thus contained
three of each of the six question/answer types generated by crossing the
Pearson and Johnson classification system with high versus low centrality.
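
The item design and the centrality rule just described can be sketched in a few lines. This is an illustrative reconstruction only: the names are hypothetical, and treating propositions with a mean rating strictly between two and three as unusable is an assumption, since the text states only the two thresholds.

```python
from itertools import product
from statistics import mean

QUESTION_TYPES = ("textually explicit", "textually implicit", "scriptally implicit")
CENTRALITY = ("central", "peripheral")


def classify_centrality(ratings):
    """Classify a proposition from its 1-4 centrality ratings.

    Mean of 3 or higher -> central; mean of 2 or lower -> peripheral.
    Assumption: anything in between is left unclassified (None).
    """
    m = mean(ratings)
    if m >= 3:
        return "central"
    if m <= 2:
        return "peripheral"
    return None


# Six cells (3 question types x 2 centrality levels), three items per cell.
design = {cell: 3 for cell in product(QUESTION_TYPES, CENTRALITY)}
items_per_text = sum(design.values())
```

The six cells with three items each reproduce the 18 questions per text reported above.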






A problem occurred which related to the nature of the Pearson-Johnson
taxonomy. Unless textually explicit or implicit questions and answers are
verbatim from the text, they involve varying amounts of scriptal knowledge.
That is, as soon as a synonym is substituted, scriptal knowledge becomes
mildly implicated in the relationship. In the present study, synonym
substitution or paraphrase was allowable within textual items. Scriptal
items required an extra piece of information which was not mentioned in the
text.

Prior Knowledge—Vocabulary Tests 

The extent of an individual's prior knowledge relevant to each of the
content areas used in the reading comprehension test passages was assessed
by means of content-specific vocabulary questions. Each of the three
content areas was addressed with 11 multiple-choice questions, each
presenting a word and four possible definitions, or a definition and four
possible words. The 33 items were placed in a single test format, with the
content areas alternating so that every third question addressed the same
content area. The resulting general vocabulary test contained three
content-related subtests. The vocabulary which was assessed by the
questions was selected so that some items were very specific to the content
area, whereas other items were somewhat less specific. This was done in an
effort to distinguish varying degrees of "expertness." In the present
study, this specificity was judged at an intuitive level.
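
The alternating layout of the combined vocabulary test, and the recovery of three subtest scores from it, can be sketched as below. The item tuples are placeholders, the order of the content areas is an assumption, and the helper names are hypothetical; the source gives only the counts (three areas, 11 items each) and the every-third-question rule.

```python
def interleave(*subtests):
    """Alternate items from equally long subtests: a1, b1, c1, a2, b2, c2, ..."""
    lengths = {len(s) for s in subtests}
    if len(lengths) != 1:
        raise ValueError("subtests must be the same length")
    return [item for group in zip(*subtests) for item in group]


def subtest_scores(test_items, correct_flags):
    """Count correct answers per content area from the combined test."""
    scores = {}
    for (area, _), correct in zip(test_items, correct_flags):
        scores[area] = scores.get(area, 0) + int(correct)
    return scores


AREAS = ("corn", "RTA", "Civil War")
# Placeholder items: (content area, item number); real item wording is not given here.
subtests = [[(area, n) for n in range(1, 12)] for area in AREAS]
vocabulary_test = interleave(*subtests)
```

With this layout, items at positions i and i + 3 always belong to the same content area, so a student sees one general vocabulary test while the scorer still obtains three content-related subtest scores.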

When the vocabulary test was administered, the students were simply
told that the test was a vocabulary test and that they were to work through
it at their own pace. They were also told how to answer the questions
(circle the correct alternative) and to be sure to answer all questions.

Intelligence Test 

As a measure of intelligence, the students were given the IPAT Culture 
Fair Intelligence Test scale 2 (Institute for Personality and Ability 
Testing), a nonverbal reasoning test involving four subtests and taking 
about 20 minutes to administer. 

Subjects 

A total of 207 eighth-grade students from two quite distinct
subpopulations participated in the study: three small rural schools in
southern Illinois (N = 101), and two parochial schools in Chicago (N =
106). The mean I.Q. on the IPAT culture-fair was 103 (SD = 14.5) with
subpopulation means of 101.01 (SD = 13.94) for rural students, and 104.83
(SD = 14.89) for urban students.

Procedure 

In order to ensure that ability was equally spread across the groups, 
scores on standardized reading comprehension tests were obtained several 
days prior to the study and were used to rank order students before 
assigning them to groups, thus producing stratified random samples. 
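
One way such a rank-then-assign procedure could work is sketched below. The report does not spell out the exact mechanics, so the blocking scheme (consecutive blocks of four ranked students, one randomly assigned to each group), the function name, and the toy scores are all assumptions made for illustration.

```python
import random


def stratified_assignment(scores, n_groups=4, seed=0):
    """Assign students to groups within rank-ordered blocks.

    scores: dict mapping student id -> prior standardized reading score.
    Returns a dict mapping student id -> group number (1..n_groups).
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    assignment = {}
    for start in range(0, len(ranked), n_groups):
        block = ranked[start:start + n_groups]
        groups = list(range(1, n_groups + 1))
        rng.shuffle(groups)  # random order of groups within this ability block
        for student, group in zip(block, groups):
            assignment[student] = group
    return assignment


# Hypothetical scores for 12 students.
scores = {f"s{i:02d}": 40 + 3 * i for i in range(12)}
groups = stratified_assignment(scores)
```

Because every block of adjacent ranks contributes one student to each group, the groups end up with similar ability distributions while assignment within a block remains random.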

There were four between-subject experimental conditions. Three of
these were based upon the extent to which subjects were dependent upon
long-term memory to answer the questions. Group One (N = 45) was least
dependent on long-term memory since it had the text available to refer to
while answering the questions. Group Two (N = 47) was not allowed to look
back at the text while answering the questions, but proceeded to answer the
questions as soon as the passage was read. The third group (N = 49) was
not only unable to refer back to the text while answering the questions,
but had a five-minute task interposed between reading a text and answering
the questions. The tasks used were subtests of the IPAT non-verbal which
the other groups took in one sitting.

The fourth group (N = 50) was a control group. These students were
required to answer the questions without the benefit of having read the
text. Such a group was necessary in order to demonstrate that the effects
of prior knowledge were not simply on question answering, but on reading.
In each school, Group Three was tested separately from Groups One, Two, and
Four since only they required systematic interruption of their reading and
question answering.

Each student was given an envelope containing the necessary materials.
All took the vocabulary test first. Groups One, Two, and Four then took the
IPAT non-verbal I.Q. test followed by their comprehension tests. The third
group received their texts in a different manner. They were given the
text, then a section from the IPAT, followed by the questions. This
pattern was then repeated for each of the other two text topics.

Results and Discussion 
All major analyses involved split plot hierarchical multiple
regressions (Cohen & Cohen, 1975). Since the within-subjects measure of
prior knowledge was not independent of the between-subjects measures, the
individual's mean score on the dependent variable was entered as the first
independent variable in the within-subjects analysis. This procedure has
the effect of removing all between-subject variance and leaving only
within-subject variance (Erlebacher, 1977).
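
The logic of entering each subject's mean first can be illustrated with a small sketch. It is not the author's analysis: it omits the passage contrasts, uses plain Python in place of a regression package, and computes the share of within-subject variance as a squared partial correlation, which is one standard way to express the increment from a later step of a hierarchical regression. All names and the toy data are hypothetical.

```python
from statistics import mean


def simple_ols_residuals(y, x):
    """Residuals from regressing y on a single predictor x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def within_subject_share(subjects, comprehension, prior_knowledge):
    """Share of within-subject comprehension variance tied to prior knowledge."""
    by_subject = {}
    for s, y in zip(subjects, comprehension):
        by_subject.setdefault(s, []).append(y)
    # Step 1 predictor: the subject's own mean score, repeated for each observation.
    subj_mean = [mean(by_subject[s]) for s in subjects]
    e_y = simple_ols_residuals(comprehension, subj_mean)    # within-subject deviations
    e_x = simple_ols_residuals(prior_knowledge, subj_mean)  # prior knowledge, adjusted
    sxy = sum(a * b for a, b in zip(e_x, e_y))
    share = sxy ** 2 / (sum(a * a for a in e_x) * sum(b * b for b in e_y))
    return e_y, share


# Toy data (hypothetical): two subjects, three texts each. Comprehension is built as
# a subject effect plus twice the prior knowledge score, so within each subject
# prior knowledge accounts for all of the variation.
subjects = ["a", "a", "a", "b", "b", "b"]
prior = [1, 2, 3, 3, 1, 2]
comprehension = [2, 4, 6, 11, 7, 9]
e_y, share = within_subject_share(subjects, comprehension, prior)
```

Regressing a score on the subject's own mean yields a slope of one and an intercept of zero, so the step 1 residuals are simply each score minus that subject's mean; this is why the step leaves only within-subject variance for later variables to explain.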

All students read the passages in the same order, and the passages
were clearly not of equal difficulty. These effects were removed by
entering "passage" (as two orthogonal contrasts) second in the
within-subjects analysis. Since there was no reason to hypothesize equal (or
unequal) difficulty of the passages, these usually significant contrasts
were not interpreted.

If a subject skipped a page of questions, then those data were labeled
missing. However, an omission of one or two questions in sequence resulted
in the items being marked incorrect. Only subjects with complete data were
used in the analyses.

The Experimental Tasks

Reading Comprehension 

A problem arose with the comprehension task. While the readability of
the texts was rated at the seventh and eighth grade difficulty by the Fry
formula, the students' comprehension scores indicated that the task was
very difficult. Of course, the problem may have lain more in the questions
than in the texts. Indeed, for about five of the questions on
each text, the students' mean response was at or below chance level. The
effect of this "flooring" was to produce a restriction of range.
Nonetheless, rather than tamper with the data by discarding these items, it
was decided to analyze the intact data. The findings must be interpreted
in the light of this range restriction, and the question of possible
underestimation of effect sizes must be considered.



The Prior Knowledge Tests



This set of tests functioned well, having a full range of scores
(1-11) on two tests, a range of 2-11 on the third, and means of 8.1 (corn),
6.4 (RTA), and 6.7 (Civil War). Standard deviations were 1.9, 2.0, and
2.2, respectively.

Prior Knowledge and Reading Comprehension 

A major focus of this study was an investigation of the effects of
prior knowledge upon reading comprehension. Three different observations
were taken on each variable for each subject, one for each knowledge
domain. This means that if prior knowledge differences influence reading
comprehension for a given individual, then it is difficult to argue that
the effects were due to some other factor such as verbal I.Q., which would
be constant for that individual.

Because the within-subjects design really does allow the "all else
being equal" assumption in interpretation, one should not expect as much
variability within subjects as exists between them. However, one can
expect effects which are less contaminated by extraneous variables.
Furthermore, the number of observations involved in the within-subject side
of this study is three times that for the between-subjects side. Since the
analysis is consequently less likely to "overfit" the data (that is,
repeated samples are likely to yield very similar findings), the variables
tend to explain less dramatic but more reliable proportions of variance.
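
The within-subject logic can be illustrated with a minimal sketch. It assumes, hypothetically, that scores are held in arrays with one row per subject and one column per knowledge domain; the function name and the toy numbers are illustrative and are not taken from the study, and the study's own degrees-of-freedom handling is not reproduced. Subtracting each subject's own mean removes anything that is constant for that subject (such as I.Q.), so only within-subject covariation remains.

```python
import numpy as np

def within_subject_r2(knowledge, comprehension):
    """Share of within-subject comprehension variance associated with prior knowledge.

    Both inputs are (n_subjects, n_domains) arrays. Centering each row on the
    subject's own mean removes everything that is constant for that subject.
    """
    k = np.asarray(knowledge, dtype=float)
    c = np.asarray(comprehension, dtype=float)
    k_dev = k - k.mean(axis=1, keepdims=True)   # deviations from each subject's mean
    c_dev = c - c.mean(axis=1, keepdims=True)
    # Squared correlation of the pooled deviations (their overall means are zero).
    return float((k_dev * c_dev).sum() ** 2 / ((k_dev ** 2).sum() * (c_dev ** 2).sum()))

# Toy data (hypothetical): 4 subjects x 3 domains.
knowledge = np.array([[8, 6, 7], [9, 5, 6], [7, 7, 4], [10, 6, 8]])
comprehension = np.array([[7, 5, 6], [9, 6, 6], [4, 5, 3], [8, 6, 7]])
r2 = within_subject_r2(knowledge, comprehension)

# Adding a constant per subject (a stable ability difference) leaves the result unchanged.
shifted = comprehension + np.array([[0], [5], [-3], [2]])
r2_shifted = within_subject_r2(knowledge, shifted)
```

Because a per-subject constant drops out in the centering step, the two values agree, which is the sense in which the design holds "all else" equal.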







The Findings
Reading Comprehension and Test Bias

The first major finding of the study was that prior knowledge
accounted for 3.5% of the within-subject variance, F(1, 282) = 11.72, p <
.001 (Table 1). This result indicates that prior knowledge influences the
comprehension of texts independent of the effects of intelligence and other
between-subject confounding variables. The evidence cannot be dismissed on
the grounds of contrived materials or other validity grounds, since it has
been replicated with a selection of very ordinary texts and with
multiple-choice questions. The potential of prior knowledge as a biasing
factor is evident.

Insert Table 1 about here.

The study also offers insight into the practical implications of this
biasing effect for the assessment of reading comprehension. Between-subject
variability sheds most light on this issue. While the proportions
of variance explained are inflated by reduced degrees of freedom (though
still substantial) and a greater possibility of correlated nuisance
variables, between-subject variability reflects the assessment situation
more accurately. The proportions of between-subject variance accounted for
by prior knowledge, with general ability held constant, are shown in Table
2. The effect is consistent across texts.

Insert Table 2 about here. 




The effect was not simply due to readers' ability to answer the
questions regardless of having read the text. This possibility was
investigated through a regression analysis of the scores of students who
answered the questions without having read the text. The proportions of
between-subject variance explained by prior knowledge for each passage were
2% (corn), 1% (city), and 4% (Civil War), none of which was significant (n
= 50). Consequently, attempts to remove bias by simply discarding the less
text-dependent items seem unlikely to succeed. 

To see whether the texts were in fact biased towards one or another
subpopulation, reading comprehension scores were regressed on prior
knowledge and the subpopulation of which the reader was a member (in both
orders). Table 3 shows that the texts used were each biased towards either
the rural or the urban children (population entered first). The "corn"
passage was biased towards rural students, and the "city" passage was biased
towards urban children. These biases had been predicted a priori, but the
"Civil War" passage (presumed to be neutral) was also biased towards
rural students. Possibly the country children's curriculum covered more
(or more relevant) Civil War material.

Insert Table 3 about here. 

While this demonstrates that population level bias exists, bias is not
a population level phenomenon but an individual one. Two findings support
this claim. First, there was a trend towards a sex bias in the "Civil War"
passage. Boys tended to know more about war-related things and to read about
them with greater comprehension. While not statistically significant, F(1, 139)
= 3.71 (F required for significance at the .05 level = 3.91), this trend
illustrates the fact that when bias is defined at the population level,
there are potentially as many biases as we can describe subpopulations.
Second, when prior knowledge is entered into the regression before
subpopulation, the latter has virtually no remaining predictive power.
Thus, removal of the population level bias can be accomplished by removing
the individual level bias, but the reverse generally is not true.
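
The "both orders" regression described above can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis: the data are simulated so that group membership shifts prior knowledge and comprehension depends on prior knowledge alone, and the helper names (`r_squared`, `increments`) are hypothetical.

```python
import numpy as np

def r_squared(y, predictors):
    """R^2 from an ordinary least-squares fit of y on the given predictor columns."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def increments(y, ordered):
    """Proportion of variance added by each predictor when entered in the given order."""
    out, entered, previous = {}, [], 0.0
    for name, values in ordered:
        entered.append(values)
        current = r_squared(y, entered)
        out[name] = current - previous
        previous = current
    return out

# Simulated data (hypothetical): group shifts knowledge; comprehension depends on knowledge only.
rng = np.random.default_rng(0)
group = np.repeat([0.0, 1.0], 100)                      # e.g. urban = 0, rural = 1
knowledge = 6 + 2 * group + rng.normal(0, 1.5, 200)
comprehension = 2 + 0.8 * knowledge + rng.normal(0, 1.5, 200)

group_first = increments(comprehension, [("group", group), ("knowledge", knowledge)])
knowledge_first = increments(comprehension, [("knowledge", knowledge), ("group", group)])
```

In data built this way, the group predictor explains variance when entered first but adds very little once prior knowledge is already in the equation, which mirrors the pattern reported for subpopulation membership.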

There are two ways to examine systematic effects, and each is 
represented by one of the above definitions. An empirical demonstration of 
group differences represents the current definition. However, there are as 
many such potential biases as there are conceivable subpopulations. Most 
group biases normally go unnoticed simply because we lack the population 
descriptors and motivation to test for them. It is because of this that we 
cannot simply try to statistically identify biased items and then eliminate 
them from the test post hoc. How many subpopulation descriptors should we 
use? Just the politically expedient ones? 

The second way to examine systematic effects is through theory. If we
have a theory of the source of biases, we can look at bias at the 
individual level. The proposed definition recognizes that a test can be 
biased against an individual within a population; identification of such 
bias need no longer be dependent on differences between arbitrarily 
selected subpopulations. Theory offers us a solution to the problem of 
test bias. The solution involves adopting an approach not unlike that 
commonly taken over the I.Q./reading comprehension relationship. That is,
initially it has been accepted that reasoning is an integral part of
reading (Johnston, 1983; Thorndike, 1917; Tuinman, 1979); thus nobody
tries to construct reasoning-free reading comprehension tests. Instead,
researchers are satisfied to examine reading comprehension in the context of a
measure of reasoning ability such as a WISC score. Perhaps the same should
be done with a measure of prior knowledge. This study shows that, once
relevant prior knowledge has been measured, its effects can be removed
statistically from tests of reading comprehension when required. Removing
the effects of prior knowledge provides us with a residual reading
comprehension score which is free from bias.
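
A residual score of this kind can be sketched in a few lines. The sketch assumes a simple linear regression of comprehension on a single prior knowledge score for one passage; the function name and the toy scores are hypothetical, and the report's own regression models may have included further terms.

```python
import numpy as np

def residualize(comprehension, knowledge):
    """Comprehension scores with the linear effect of prior knowledge removed."""
    y = np.asarray(comprehension, dtype=float)
    x = np.asarray(knowledge, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)    # least-squares line y = slope * x + intercept
    return y - (slope * x + intercept)         # observed score minus expected score

# Toy scores (hypothetical) for six readers on one passage.
knowledge = np.array([3, 5, 6, 8, 9, 11])
comprehension = np.array([4, 5, 8, 7, 10, 11])
residual = residualize(comprehension, knowledge)
```

A positive residual means the reader comprehended more than expected for his or her level of prior knowledge; by construction the residuals average zero and are uncorrelated with the prior knowledge score.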

There are several criticisms which might be leveled at this approach. 
It might be protested that the prior knowledge bias can be eliminated more 
easily by using a variety of text topics to produce an aggregate score, as 
is done in current tests. Table 4 shows that indeed this is the case. 
However, the figures also indicate that there is an unfortunate side effect 
of such a procedure. The proportion of variance related to I.Q. becomes 
much greater. That is, an I.Q. bias has been introduced. On the other 
hand, the population difference also disappears when the bias is removed 
statistically from each passage score before aggregating the "debiased" 
scores. But there is also a beneficial side effect. The extent to which
I.Q. explains performance is also reduced considerably, from 14.6% of the
variance to 4.1%. This reduction is significant at the .001 level using a
dependent-sample t test for differences between variances, t(138) = 20.43.
Furthermore, the table illustrates what may be measured by reading
comprehension tests. Removing the influence of prior knowledge leaves a
variance of 6.15 instead of 35.4, or 17% of the original variance. In other
words, 17% of the variance in the measure is due to factors which are
independent of prior knowledge.
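
The 17% figure follows directly from the two variances quoted above. The short calculation below uses only those reported values; the variable names are illustrative.

```python
raw_variance = 35.4        # variance of the aggregated raw scores, as reported
residual_variance = 6.15   # variance after removing prior knowledge, as reported

remaining = residual_variance / raw_variance   # share independent of prior knowledge
removed = 1.0 - remaining                       # share removed along with prior knowledge

print(f"remaining: {remaining:.1%}, removed: {removed:.1%}")
# prints "remaining: 17.4%, removed: 82.6%"
```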






Insert Table 4 about here. 

Critics may well question the reliability of residual scores.
Substantial norming populations and well-designed tests may reduce this
problem somewhat. However, it must be borne in mind that current methods
are no better. Any greater reliability of raw scores on conventional tests
is not due to their reliably measuring reading comprehension,
because a good part of the raw score is a result of differences in
intelligence and other factors. Thus, the greater raw-score reliability is
at the expense of validity.

Critics might wonder about the context in which a residual score might
be useful. In order to address this issue it is important to make a
distinction between the use of the prior knowledge measure at the
individual level and at the group level. The residualized score is most
useful at the group level where one is interested in knowing how able one
or more groups of readers are at comprehending from text given their levels
of relevant prior knowledge. Interestingly, when the debiased scores for
individual passages are summed into a total score, the net score is not
only free from prior knowledge bias, but also relatively free from general
reasoning bias (Table 4). Both effects arise because we have removed the
cause (rather than just the symptom) of the biases from the test. With
the cause gone, the symptoms go too.

Alternatively, the effects need not be removed; instead, performance on
the comprehension test might simply be considered in light of a measure of 
the reader's prior knowledge. In this case, with appropriate norms, a 






reader's performance might be considered separately on familiar and 
unfamiliar material. Since there are different strategies involved in 
reading familiar and unfamiliar texts (also depending on the reader's
goal), such evaluation may yet provide valuable diagnostic information. At
the individual level, the residual score is still meaningful in that it 
describes the individual's reading comprehension performance relative to 
that which would be expected given his or her level of prior knowledge. 
However, when working with individuals, it would be best to have all three 
scores available for interpretation: the raw reading comprehension score, 
the prior knowledge score, and the residualized reading comprehension 
score. 

Other types of reading problems ultimately may also be detected using 
this approach. One such diagnosable reading difficulty may be that
described by Spiro (1980) as a "schema selection" problem. This is the 
problem caused by failure to use relevant prior knowledge when it would be 
appropriate to do so and the reader has it available. Of course, problems 
caused by "schema unavailability" would also be readily detected, that is, 
failures caused simply by the reader not having the appropriate relevant 
knowledge base before reading. While these proposals remain, for the 
moment, untested, the promise is great, and they are an important area to 
be developed in future research. The first step towards this must be the 
refinement of the measure of prior knowledge. 

Reading Vocabulary and Reasoning 

Anderson and Freebody (1979) have described three hypotheses to 
explain why vocabulary tests account for so much of the variance in reading 




comprehension tests. The first of these hypotheses is the "general ability 
hypothesis." This hypothesis proposes that the relationship is simply that 
vocabulary tests estimate general ability and brighter students will be 
better readers. This study provides evidence against this 
hypothesis. First, the within-subject analysis involving the prior 
knowledge vocabulary test shows the effect of prior knowledge on 
comprehension (Table 1). Since it does not seem reasonable to assume that 
an individual's general ability varies from moment to moment, these effects 
do not support the general ability hypothesis.

Second, in the between-subject analyses (Table 2), the variance 
associated with reasoning ability (as measured by the IPAT) was covaried
out before the prior knowledge vocabulary scores entered the regression
equation. The prior knowledge test still accounted for a substantial
portion of reading comprehension variance. Thus, at least some of the
relationship between vocabulary and reading comprehension is not simply 
because both relate to general ability. 

While these findings argue against the general ability hypothesis,
they support the "prior knowledge hypothesis," which asserts that the
connection between vocabulary and reading comprehension tests is prior
knowledge. That is, knowing the words in the vocabulary test is indicative 
of underlying schemata. At least this is so in the single text situation. 
Standardized tests, however, use more than one text. 

Contemporary reading comprehension tests contain a number of texts, each
on a different topic. Vocabulary tests also contain items from a broad
range of domains. Combining the content-specific vocabulary tests into a 
single nonspecific vocabulary test would reflect this situation and at the 






same time produce a longer, more reliable test. If general verbal ability
is the source of the relationship between current vocabulary tests and
reading comprehension tests, a more general vocabulary test should
correlate more highly with comprehension of a given passage. To test this
hypothesis, three vocabulary scores were constructed for each passage as
follows:

(1) the sum of the 11 content-specific items (specific vocabulary)

(2) the sum of the remaining 22 items (general vocabulary [2])

(3) the sum of all 33 items (general vocabulary [3]).
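
The three scores listed above can be sketched as simple sums over item columns. The layout is an assumption made for illustration: responses are taken to be a 0/1 array with 33 columns grouped into three blocks of 11 items, and the domain labels and function name are hypothetical.

```python
import numpy as np

ITEMS_PER_DOMAIN = 11
DOMAINS = ("corn", "city", "civil_war")   # assumed column order, 11 items each

def vocabulary_scores(responses, passage):
    """Return (specific, general_2, general_3) vocabulary scores for one passage.

    responses: (n_subjects, 33) array of 0/1 item scores, grouped by domain.
    """
    r = np.asarray(responses)
    start = DOMAINS.index(passage) * ITEMS_PER_DOMAIN
    specific = r[:, start:start + ITEMS_PER_DOMAIN].sum(axis=1)  # 11 content-specific items
    general_3 = r.sum(axis=1)                                    # all 33 items
    general_2 = general_3 - specific                             # the remaining 22 items
    return specific, general_2, general_3

# Toy responses (hypothetical): two subjects.
responses = np.zeros((2, 33), dtype=int)
responses[0, :11] = 1     # subject 0 knows every "corn" item and nothing else
responses[1, 11:] = 1     # subject 1 knows every item except the "corn" ones
spec, gen2, gen3 = vocabulary_scores(responses, "corn")
# spec is [11, 0], gen2 is [0, 22], gen3 is [11, 22]
```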



The mean correlations of these three scores with I.Q. and with reading
comprehension (Table 5) suggest that the more vocabulary tests are
aggregated across content, the more they correlate with I.Q. and the less
with reading comprehension, though the trend is not statistically
significant.

It could be argued that this relationship with I.Q. is simply because 
of increased reliability as a result of more test items being aggregated. 
To counter this argument, two similar general vocabulary tests were
constructed, each containing a random sample of items with the restriction
of equal numbers from each content area instead of all items from each
specific test. This provided three tests of differing generality but with
equal numbers of items. Table 5 shows in parentheses the correlations
between these tests, reading comprehension, and I.Q. The figures suggest
that the increased correlation with I.Q. is due more to increased diversity 



Insert Table 5 about here.







of content than to increased reliability. Vocabulary tests with equal
numbers of items but increasing generality were still increasingly
correlated with I.Q.
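
The equal-length control tests can be sketched as a stratified random draw of item indices. The report does not state how many items were taken from each content area, so `per_area` below is a hypothetical parameter, as are the function name and the assumed grouping of columns into blocks of 11.

```python
import numpy as np

def sample_equal_items(n_areas, items_per_area, per_area, rng):
    """Column indices for a test drawing `per_area` random items from every content area."""
    chosen = []
    for area in range(n_areas):
        first = area * items_per_area                       # first column of this area
        picks = rng.choice(items_per_area, size=per_area, replace=False)
        chosen.extend(sorted(int(first + p) for p in picks))
    return np.array(chosen)

rng = np.random.default_rng(1)     # fixed seed so the draw is repeatable
items = sample_equal_items(n_areas=3, items_per_area=11, per_area=4, rng=rng)
```

Holding the number of items fixed while varying how many content areas they span separates the effect of content diversity from the effect of test length.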

Further support was gained for this hypothesis by entering the
general vocabulary [2] scores into the regression before the specific
vocabulary. If the prior knowledge hypothesis is correct, the specific
vocabulary test should still account for a significant proportion of variance
in reading comprehension, even after the statistical removal of the effects
of the general vocabulary [2] test. This was indeed the case. The 22-item
general test accounts for an average of 3.9% of the reading comprehension
variance whereas the specific test accounts for an average of 9% of the 
variance (Table 6). This finding is in spite of the fact that the general 
test has twice as many items, covers a broader span of knowledge, and 
enters the regression first. 



Table 4 presents a different perspective on the problem. When the 
effects of prior knowledge are removed from each passage, and the 
individual's total residual score is computed, I.Q. accounts for a very 
much smaller portion of the variance than it does when the raw (biased) 
scores are aggregated. It is still significant, as one would expect
(Johnston, 1983; Thorndike, 1917; Tuinman, 1979), but explains a smaller
proportion of the variance. 

From these arguments, it can be seen that while the prior knowledge 
hypothesis is supported for specific vocabulary and comprehension of 
specific texts, the standardized tests provide a situation best described 



Insert Table 6 about here. 







by the "general ability" hypothesis. Aggregating performance on vocabulary 
or reading comprehension tests across content areas tends to increase the 
correlations between those tests and tests of I.Q. because both are biased
towards greater general knowledge. 

A further source of relationship between standardized reading 
comprehension tests and I.Q. was also explored. It was suggested in the 
first section of this paper that part of the correlation between I.Q. and 
reading comprehension in standardized tests may stem from the fact that the 
text is fully available for the reader to refer to for answers. Such tests
require search and match strategies. This hypothesis was testable.

Indeed, the hypothesis did gain some support from the correlations
between I.Q., comprehension, and prior knowledge when the task depends
increasingly on long-term memory. When the text is available the
correlation between I.Q. and comprehension is higher (r = .31) than when the
text is not available but questions are immediate (r = .27), which is, in
turn, higher than the correlation when the text is unavailable and the
questions are delayed (r = .19). The reverse trend is evident for the
correlations between comprehension and prior knowledge. When the text is
available the correlation between prior knowledge and reading comprehension
is lower (r = .23) than when the text is not available but questions are
immediate (r = .24), which is lower than when the questions are delayed
(r = .33). While these correlations are not significantly different from one
another, they consistently proceed in opposite directions as predicted.
The probability of these two trends occurring by chance is .063. Because
the two trends proceed in opposite directions, it is difficult to argue





that the reduced correlation with I.Q. might be due to reduced variance or 
some other alternative. Thus, the data suggest that standardized reading
comprehension tests are biased towards readers with greater general 
ability. 

Question Type and Long Term Memory Demands

The effects of prior knowledge on reading comprehension when the test 
tasks made readers more or less dependent on information storage and 
retrieval were examined using three groups of subjects. 

Group One subjects had minimal dependence on memory since they had full 
access to the text while answering the questions. Group Two was denied 
such access to the text but answered the questions as soon as they had read 
the text. The third group was denied text access during question answering 
and had an interfering task between text and questions.

The contrast between Group One and the other two groups was
significant, F(1, 282) = 7.67, p < .01. The means for the three groups were
7.4, 6.3, and 6.2, respectively (standard deviations 3.0, 3.0, 2.8). The 
contrast between the latter two groups was not significant, possibly 
because of floor effects, and possibly because the approximately five 
minute filled delay was not long enough to induce further changes in 
performance. However, the major interest in this variable was in its 
relative effects on different question types. 

For the analysis of the effects of different question types, each
subject's comprehension score was broken down, within each topic, into six
subscores, representing the three question types by two levels of
importance. Importance was dichotomously coded, and question type was
entered into the regression as two orthogonal contrasts: Q1 representing
the contrast between textual items (the mean of the textually explicit and
textually implicit items) and scriptally implicit items; Q2 representing
the contrast between textually implicit and textually explicit items. The
results of this analysis are shown in Table 7.
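
The two contrasts can be written as weight vectors over the three question types. The particular weights and sign conventions below are assumptions chosen to match the verbal description; the report does not give the codes actually used. The example means are the percentages correct reported for the three question types.

```python
import numpy as np

# Question types in a fixed (assumed) order.
TYPES = ("textually_explicit", "textually_implicit", "scriptally_implicit")

# Q1: mean of the two textual types versus scriptally implicit.
Q1 = np.array([0.5, 0.5, -1.0])
# Q2: textually implicit versus textually explicit.
Q2 = np.array([-1.0, 1.0, 0.0])

def contrast_value(means, weights):
    """Weighted combination of the three question-type means."""
    return float(np.dot(weights, means))

# Proportions correct of .45, .37, and .29 for the three types.
means = np.array([0.45, 0.37, 0.29])
q1 = contrast_value(means, Q1)   # textual average minus scriptal
q2 = contrast_value(means, Q2)   # implicit minus explicit
orthogonal = float(np.dot(Q1, Q2)) == 0.0
```

Each weight vector sums to zero and their dot product is zero, so the two contrasts carry non-overlapping information about differences among the question types.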

Insert Table 7 about here. 

Each subscore contained only three multiple choice items. 
Consequently, the subscores have a high error component and a very small
variance which was restricted further by the generally low performance. 
These constrictions are reflected in the proportions of variance explained. 
The proportions should be given less credence than the F values. 

Both question type contrasts were significant, reflecting the fact 
that textually explicit questions (mean = 45%) were easier than the 
textually implicit questions (mean = 37%) which were easier than the 
scriptally implicit questions (mean = 29%). As a main effect, centrality 
of the piece of information being assessed was not a significant predictor 
of performance. However, both centrality and question type are involved in 
significant interactions with other variables. 

Two series of interactions were significant. The first of these 
involved prior knowledge, centrality, and the availability of the text 
while answering questions. When the text is available to refer to, 
answering peripheral questions is easy (Figure 1). When the text is
not available to refer to, the same task becomes very difficult. It seems
that peripheral information is easily obtained from searches of the text 
but less readily stored. 






Insert Figure 1 about here. 

On the other hand, central questions posed an easier task when the 
text was not available for reference than when it was available. Schema 
theory would predict that there should be minimal deterioration in 
performance on central questions when memory must be relied upon more, 
since the reader presumably constructs a central chain in the process of 
comprehending. The fact that performance actually gets better may be 
because of a preoccupation, on the part of the reader, with the textual
features. That is, when the text is available, a reader may use search
strategies rather than comprehension strategies. Text-based distractors
may then prove to be more attractive, since the search would also turn up
bits of information found in the distractors. This interpretation is
supported by the results of a study by Nicholson, Pearson, and Dykstra
(1979), who found that when readers were allowed access to the text (which
contained embedded errors) while answering questions, they were less
accurate in their answers than if they did not have access to the text. In
the present study it is also noticeable that the improvement on central
questions is greater for students with greater prior knowledge (Figure 1).
This might also be expected if readers were indeed able to more
successfully store the central chain of information than the peripheral
details.

In addition, Figure 1 shows an interaction between prior knowledge and
the centrality of the questions. It indicates that when readers are
reading more familiar material they are more able to answer questions about
the text, and this advantage is greatest for more peripheral questions.
This can be interpreted in terms of a model which suggests that when
readers have greater prior knowledge, they have more highly developed
schematic structures, with more accessible "slots" for storing related
information. Thus, while the performance on central items does improve,
the improvement is more marked on the peripheral questions. In the same
way, when readers have little prior knowledge, the biggest decrement in
performance when memory is called upon is on the peripheral items. Readers
generally answer central questions better when the text is unavailable than
when it is available, and this trend is more pronounced when readers have
greater topic-relevant knowledge.

The second series of interactions includes those involving centrality,
text availability, and question type (the contrast between scriptally
implicit items and the mean of the two textual items). The contrast
between question type and text availability (Figure 2) indicates that while
textual questions are easier than scriptal questions, when the reader does
not have access to the text the drop in performance on textual questions is
extreme. The fact that this falloff in performance is not as severe for
the scriptal items is probably at least partly due to an obvious floor
effect.

Insert Figure 2 about here. 

While central questions are more difficult than peripheral ones, if 
they are scriptal as well as central, they are even more difficult. Again,
the scriptal questions show an improvement when readers do not have access 
to the text, possibly reflecting their reluctance, when the text is 




present, to use their prior knowledge. This may also reflect increased 
attractiveness of the text-based distractors through the readers' greater 
belief in the text than in their own understanding. Again, this supports
the findings of the Nicholson, Pearson, and Dykstra (1979) study noted above. 

The most interesting aspect of this interaction involves the 
difference between central and peripheral text-based questions across tasks
differing in long-term memory demands. The readers' performance on more 
central textual questions is relatively unaffected when the text is 
unavailable to refer back to, whereas their performance on peripheral 
textual questions shows a precipitous drop. This is exactly as could be 
predicted from the type of general model proposed by Schank (1975) and 
Kintsch and van Dijk (1978) and that proposed by Omanson (1982) for 
narrative text. 

Summary and Conclusions 
This study provides evidence that prior knowledge influences the
comprehension of texts and that the effect is not because of contrived 
materials, or other validity problems. Neither is it simply because of 
improved ability to guess the questions without first reading the text. 
This means that prior knowledge can be responsible for biasing the 
information gained from reading comprehension tests. The study also raises 
the question of what standardized reading comprehension tests measure. The 
answer, as indicated by this study, is that they provide a fairly good
proxy for I.Q., just as do standardized vocabulary tests. A high score on
such a reading comprehension test indicates that the student will probably 
have little trouble in school, particularly in reading, and that he or she 






seems to have the adequate and appropriate fund of general knowledge
expected of a middle-class American student.

What, then, does a low score indicate? This is a much more difficult
question. It might indicate that the child cannot read adequately. It
might also indicate that his or her store of prior knowledge in the areas
tapped by the test is not adequate for the task. Or the student might
have, as Thorndike (1917) claims, generally meager processing skills. The
question of what to do about the student's problem then arises. Without
being certain of the cause, it is very difficult to decide on a course of
remedial action.

The study suggests some potential antidotes to the problem. First, if 
comprehension is defined as the forming of a coherent cognitive model of 
the text meaning, then interest is most likely to be on the reader storing 
the central aspects of the text. It seems that the best way to evaluate 
this is to ask central questions, and possibly to prevent the reader from
referring to the text while answering the questions. Note that asking 
central questions implies that the text should be long enough and 
structured enough to have a central thread. 

There may also be arguments for other question types which might 
supply diagnostic information. For example, if the definition of reading 
comprehension includes the use of prior knowledge in constructing the model 
of meaning, or the integration of the model of meaning with prior 
knowledge, then it might be useful to ask scriptally implicit questions 
also, since they require the reader to use prior knowledge. However, in 
asking such questions it must be recognized that they describe something 





different about the reader's comprehension from that which textual 
questions describe.

By looking at performance on textual and scriptal items in the context 
of a prior knowledge score, it might be possible to diagnose schema
selection problems. The prior knowledge measure by itself enables 
diagnosis of schema availability problems, i.e., lack of prior knowledge 
preventing adequate processing of the text. However, the diagnostic 
aspects of question type have only been scratched by this study. Much more 
work is needed to develop these question types into systematic and 
meaningful diagnostic instruments.

The present study demonstrates that prior knowledge is a powerful 
source of test bias. It has been shown (Johnston, 1981) that the extent of
an individual's prior knowledge influences the basic cognitive processes 
which are involved in reading comprehension. It has also been argued amply 
and demonstrated elsewhere that prior knowledge influences the inferences 
which people make as they comprehend text (e.g., Anderson, Reynolds, 
Schallert, & Goetz, 1976; Spiro, 1975).

The important things to note are that (a) these systematic influences
are described at the individual level, not at the population level, and (b)
prior knowledge is an integral part of reading comprehension. The
consequence of these two facts is that since no two individuals will have
identical prior knowledge, the construction of tests which are free of bias
at the individual level is impossible. Furthermore, it can be argued that
it would be undesirable in any case since a reading comprehension test
uninfluenced by prior knowledge would certainly not be measuring reading
comprehension as it is understood theoretically.


At the level of standardized achievement tests, a major advantage of 
an approach which involves measuring prior knowledge has been demonstrated 
in the present paper. Bias can be effectively removed from tests by 
partialing out the effects of prior knowledge. The valuable aspect of the 
bias removal is that it is not a widely recognized bias. Indeed, it shows
that there are probably many biases, since bias arises at the individual 
level, not at the group level. The proposed approach allows us to avoid 
the dilemma of which group biases to attempt to remove. 

The proposed method of bias removal has a further advantage. Since 
reading comprehension involves not only being able to locate specific 
information on a page, but forming a coherent integrated representation of 
the information, more substantial text segments are called for. The 
introduction of prior knowledge measures would allow this luxury since it 
would no longer be necessary to increase the number and variety of texts to 
reduce bias. Few would deny the greater validity of comprehension
estimates based on more substantial segments of text. Apart from the
greater flexibility which they allow in terms of question generation,
longer texts allow more structure to be built into them and they have 
greater ecological validity. 

Furthermore, since forming a coherent representation is almost 
unnecessary when the text is available to return to while answering the 
questions, perhaps at least some parts of tests should not allow such 
access. This may provide a better assessment of understanding of 
the central aspects of a text since variability is associated more with
central than peripheral questions in the no access situation. Knowing that




the two types of task require different skills, particularly given
differing prior knowledge, it may be possible to form a better judgment as 
to the cause of a child's reading problem. Note, too, that these 
advantages hold whether the test is for diagnostic or for survey purposes. 
While the approach does provide information which is diagnostic, when used 
for bias removal, this information can increase the construct validity of 
the test score, since it provides an estimate of reading ability which is 
less contaminated than current test scores by differences in prior 
knowledge and general ability. 

Such advantages are not restricted to the standardized test arena. 
The classroom teacher, and other informal assessors (reading specialists, 
etc.), can also accomplish the same task with a few well-chosen questions.
Indeed, most teachers already ask relevant prior knowledge questions as a 
prelude to reading, largely as a "schema activation" procedure, to help the 
students bring their knowledge to bear on the text. These same questions 
can serve the dual function of alerting the teacher to the nature and 
extent of the children's relevant knowledge, thus providing an insight into 
the nature of the task demands upon the students. 

It is important that educators begin to look at comprehension skill in 
the context of the students' relevant prior knowledge, a suggestion made 
feasible by the finding that a brief content-relevant vocabulary subtest 
can provide a reasonably good indicator of prior knowledge. This use may 
be most obvious in assessments of reading in the content area. Unless the 
prior knowledge measure is available, little can be said about a student's 
ability to read content area text. Failure may be due to inadequate prior 
knowledge, inadequate strategies, or both. Sternberg (1981) claims that in 
mental testing the diagnostic goal is to be able to decide whether various 
processing components are unavailable, inaccessible, or inefficiently 
executed, and whether the components and strategies operate on an 
inadequate mental representation. He suggests that perhaps "cognitive 
contents" tests are needed as well as cognitive components tests so that 
both knowledge and processing deficiencies can be assessed. Clearly there 
is room for improvement on the test questions developed in this study, but 
by the systematic examination of various domains, the ability to construct 
such tests should improve considerably. 

The data on question types also suggested the possibility of a 
reliability-validity trade-off in current assessment procedures. When the 
text is available for reference while answering questions, the item type 
which distinguishes best between high and low knowledge readers is the 
peripheral item. Consequently, if items are selected on the basis of the 
discrimination index, we will end up with tests which tend to be composed 
of relatively trivial items just as Tuinman (1979) suggests. Indeed, 
Johnston and Afflerbach (Note 1) have provided evidence that such is the 
case. Is this what we wish to measure? Is it really what we consider to 
be comprehension? We must begin to look carefully at our priorities on 
these issues. A deeper understanding of exactly what we are getting from 
our current measures and of the alternatives should help in this matter. 
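
The selection problem can be made concrete with a small sketch of a conventional upper-lower discrimination index for a single item. This is an illustration only: the report does not say which discrimination index test constructors use, so the 27% upper and lower groups, the function name `discrimination_index`, and the toy data in the usage note are all assumptions.

```python
from typing import Sequence


def discrimination_index(item_correct: Sequence[int],
                         total_scores: Sequence[float],
                         fraction: float = 0.27) -> float:
    """Upper-lower discrimination index D for one test item.

    item_correct: 1 if the examinee answered the item correctly, else 0.
    total_scores: each examinee's total test score.
    fraction:     share of examinees placed in each extreme group
                  (0.27 is a common convention, assumed here).
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("need one item response per examinee")
    n = len(total_scores)
    k = max(1, round(n * fraction))          # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]     # lowest and highest scorers
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower                 # ranges from -1 to +1
```

With ten examinees whose total scores run from 1 to 10, an item passed only by the top five gives `discrimination_index([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], list(range(1, 11)))` equal to 1.0, while an item everyone passes gives 0.0. An index of this kind rewards any item that separates high from low scorers, whether or not the item is central to the passage, which is the difficulty raised above.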

In conclusion, this paper was motivated by disenchantment with the 
assessment approach of controlling "nuisance" variables, particularly prior 
knowledge, by randomization. The approach cannot work. In particular, 
bias cannot be eliminated by collapsing across various content domains and 
throwing out items which violate an expected distribution. A more 
productive approach is to measure such "nuisance" variables and take them 
into account, as the valuable information which they are, for our 
assessment interpretation. 
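
One way to "take them into account" is the residual scoring described in the note to Table 4, where the dependent variable is the sum of three residual content scores after the effects of prior knowledge have been removed from each. The sketch below assumes a simple linear regression of each passage score on that passage's prior knowledge measure; the function names and the toy numbers in the usage note are hypothetical, and the study's own computations may have differed in detail.

```python
from typing import List, Sequence


def residualize(scores: Sequence[float],
                prior_knowledge: Sequence[float]) -> List[float]:
    """Residuals from regressing comprehension scores on prior knowledge.

    Fits score = intercept + slope * prior_knowledge by least squares and
    returns the part of each score that the fit does not account for.
    """
    n = len(scores)
    if n != len(prior_knowledge) or n < 2:
        raise ValueError("need paired observations for at least two readers")
    mean_x = sum(prior_knowledge) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in prior_knowledge)
    if sxx == 0:
        raise ValueError("prior knowledge scores do not vary")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(prior_knowledge, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(prior_knowledge, scores)]


def adjusted_totals(scores_by_passage: Sequence[Sequence[float]],
                    knowledge_by_passage: Sequence[Sequence[float]]
                    ) -> List[float]:
    """Sum each reader's residual scores across passages."""
    per_passage = [residualize(s, k)
                   for s, k in zip(scores_by_passage, knowledge_by_passage)]
    return [sum(reader) for reader in zip(*per_passage)]
```

For example, `residualize([1, 2, 3, 5], [1, 2, 3, 4])` returns residuals of roughly 0.2, -0.1, -0.4 and 0.3, which sum to zero. A reader's adjusted total is then positive when comprehension is better than that reader's prior knowledge alone would predict, and negative when it is worse.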






Reference Note 

Johnston, P., & Afflerbach, P. The centrality of reading comprehension 
test questions and their validity. Paper presented at the annual 
meeting of the New York State Reading Association, Kiamesha Lake, N.Y., 
November 1982. 




References 

Anderson, R. C., & Freebody, P. Vocabulary knowledge (Tech. Rep. No. 136). 
Urbana: University of Illinois, Center for the Study of Reading, 
August 1979. (ERIC Document Reproduction Service No. ED 177 480) 

Anderson, R. C., Reynolds, R. E., Schallert, D. L., & Goetz, E. T. 
Frameworks for comprehending discourse (Tech. Rep. No. 12). Urbana: 
University of Illinois, Center for the Study of Reading, July 1976. 
(ERIC Document Reproduction Service No. ED 134 935) 

Bartlett, F. C. Remembering. Cambridge, Mass.: University Press, 1932. 

Bransford, J. D., & Johnson, M. K. Contextual prerequisites for 
understanding: Some investigations of comprehension and recall. 
Journal of Verbal Learning and Verbal Behavior, 1972, 11, 717-726. 

Cohen, J., & Cohen, P. Applied multiple regression/correlation analysis 
for the behavioral sciences. Hillsdale, N.J.: Erlbaum, 1975. 

Coleman, E. B. Developing a technology of written instruction: Some 
determinants of the complexity of prose. In E. Rothkopf & P. Johnson 
(Eds.), Verbal learning research and the technology of written 
instruction. New York: Columbia University Teachers College Press, 
1971. 

Davis, F. B. Fundamental factors of comprehension in reading. 
Psychometrika, 1944, 9, 185-197. 

Davis, F. B. Research in comprehension in reading. Reading Research 
Quarterly, 1968, 3, 499-545. 




Erlebacher, A. Design and analysis of experiments contrasting the within- 
and between-subjects manipulation of the independent variable. 
Psychological Bulletin, 1977, 84, 212-219. 

Freebody, P. Effects of vocabulary difficulty and text characteristics on 
children's reading comprehension. Unpublished doctoral dissertation, 
University of Illinois, September 1980. 

Hagerup-Neilsen, A. R. The role of macrostructures and linguistic 
connectives in comprehending familiar and unfamiliar written 
discourse. Unpublished doctoral dissertation, University of 
Minnesota, 1977. 

Hanna, G. S., & Oaster, T. R. Toward a unified theory of context 
dependence. Reading Research Quarterly, 1978-79, 14, 226-243. 

Johnston, P. Reading comprehension assessment: A cognitive basis. 
International Reading Association, Newark, Del., 1983. 

Johnston, P. Prior knowledge and reading comprehension test bias. 
Unpublished doctoral dissertation, University of Illinois, August 
1981. 

Kintsch, W., & van Dijk, T. A. Toward a model of text comprehension and 
production. Psychological Review, 1978, 85, 363-394. 

Linn, R. L., Levine, M. V., Hastings, C. N., & Wardrop, J. L. An 
investigation of item bias in a test of reading comprehension (Tech. 
Rep. No. 163). Urbana: University of Illinois, Center for the Study 
of Reading, March 1980. (ERIC Document Reproduction Service No. ED 
184 091) 




Lucas, P. A., & McConkie, G. W. The definition of test items: A 
descriptive approach. American Educational Research Journal, 1980, 
17, 133-140. 

Nicholson, T., Pearson, P. D., & Dykstra, R. Effects of embedded anomalies 
and oral reading errors on children's understanding of stories. 
Journal of Reading Behavior, 1979, 11, 339-354. 

Omanson, R. C. The relation between centrality and story category 
variation. Journal of Verbal Learning and Verbal Behavior, 1982, 21, 
326-337. 

Pearson, P. D., Hansen, J., & Gordon, C. The effects of background 
knowledge on young children's comprehension of explicit and implicit 
information (Tech. Rep. No. 116). Urbana: University of Illinois, 
Center for the Study of Reading, March 1979. (ERIC Document 
Reproduction Service No. ED 169 521) 

Pearson, P. D., & Johnson, D. D. Teaching reading comprehension. New 
York: Holt, Rinehart & Winston, 1978. 

Raphael, T. E. The effect of metacognitive awareness training on students' 
question answering behavior. Unpublished doctoral dissertation, 
University of Illinois, 1981. 


Reynolds, R. E., Taylor, M. A., Steffensen, M. S., Shirey, L. L., & 
Anderson, R. C. Cultural schemata and reading comprehension (Tech. 
Rep. No. 201). Urbana: University of Illinois, Center for the Study 
of Reading, April 1981. 

Schank, R. C. The structure of episodes in memory. In D. G. Bobrow & A. 
Collins (Eds.), Representation and understanding: Studies in 
cognitive science. New York: Academic Press, 1975. 

Spearitt, D. Identification of subskills of reading comprehension by 
maximum likelihood factor analysis. Reading Research Quarterly, 1972, 
8, 92-111. 

Spiro, R. J. Inferential reconstruction in memory for connected discourse 
(Tech. Rep. No. 2). Urbana: University of Illinois, Center for the 
Study of Reading, October 1975. (ERIC Document Reproduction Service 
No. ED 136 187) 

Spiro, R. J. Schema theory and reading comprehension: New directions 
(Tech. Rep. No. 191). Urbana: University of Illinois, Center for the 
Study of Reading, December 1980. 

Sternberg, R. J. Testing and cognitive psychology. American Psychologist, 
1981, 36, 1181-1189. 

Thorndike, E. L. Reading as reasoning: A study of mistakes in paragraph 
reading. Journal of Educational Psychology, 1917, 8, 323-332. 




Tuinman, J. J. Determining the passage-dependency of comprehension 
questions in 5 major tests. Reading Research Quarterly, 1974, 9, 207- 
223. 

Tuinman, J. J. Reading is recognition—When reading is not reasoning. In 
J. C. Harste & R. R. Carey (Eds.), New perspectives on comprehension 
(Monograph in Language and Reading Studies No. 3). Bloomington: 
Indiana University, 1979. Pp. 38-48. 






Table 1 

Partitioning of Reading Comprehension Variance and Tests of Significance 

                                                      Increment in 
Variable                                F             Percentage of 
                                                      Variance Explained 

Between 
  IQ                                    [illegible]       11.91 
  Text Availability                      7.67*             4.68 
  Question Delay                         <1                 .20 
  IQ x Text Availability                 <1                 .05 
  IQ x Question Delay                    <1                 .16 

Within 
  Passage Contrast 1                    20.62**            6.09 
  Passage Contrast 2                    [illegible]        7.04 
  Prior Knowledge                       [illegible]        3.46 
  IQ x Prior Knowledge                   <1                 .01 
  Text Availability x Prior Knowledge    <1                 .01 
  Question Delay x Prior Knowledge       <1                 .02 

Note. All independent variables have one degree of freedom. 
R² = .[illegible]30. 
Between subject df error = 136, R² = .170. 
Within subject df error = 282, R² = .167. 

 *p < .01. 
**p < .001. 

Table 2 

Partitioning of Between Subject Reading Comprehension 
Variance Showing the Proportion of Variance 
Associated with Prior Knowledge 

                                        Increment in 
Variable               F                Percentage of 
                                        Variance Explained 

Corn 
  IQ                   15.45****            8.78 
  Prior Knowledge      21.56****           12.25 
  TOTAL                                    21.03 

City 
  IQ                    2.88                1.92 
  Prior Knowledge       7.88**              5.26 
  TOTAL                                     7.18 

Civil War 
  IQ                   21.78****           11.26 
  Prior Knowledge      32.68****           16.89 
  TOTAL                                    28.15 

Note. df error = 139. 
  **p < .01. 
 ***p < .005. 
****p < .001. 
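
The F ratios in Table 2 appear to follow from the tabled increments by the usual test for an increment in explained variance, with the error term taken from the full model. The sketch below is a check on that reading rather than a statement of the author's procedure; the function name `f_for_increment` is hypothetical, and the relationship is inferred from the fact that it reproduces the printed values to within rounding.

```python
def f_for_increment(increment_pct: float,
                    total_r2_pct: float,
                    df_error: int,
                    df_effect: int = 1) -> float:
    """F ratio for an increment in percentage of variance explained.

    increment_pct: variance (in percent) added by the variable.
    total_r2_pct:  variance (in percent) explained by the full model.
    df_error:      error degrees of freedom for the full model.
    df_effect:     degrees of freedom for the variable (1 in Table 2).
    """
    if not 0 <= total_r2_pct < 100:
        raise ValueError("total_r2_pct must be in [0, 100)")
    error_per_df = (100.0 - total_r2_pct) / df_error
    return (increment_pct / df_effect) / error_per_df


# Corn passage, Table 2: total variance explained 21.03%, df error 139.
corn_iq = f_for_increment(8.78, 21.03, 139)      # table reports 15.45
corn_pk = f_for_increment(12.25, 21.03, 139)     # table reports 21.56
```

Applied to the City and Civil War rows with their own totals (7.18 and 28.15), the same calculation agrees with the tabled F values to two decimal places.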



Table 3 

Partitioning of Between Subject Reading Comprehension Variance, With 
Significance Tests for Each Passage. Prior Knowledge and Population 
Group (Rural/Urban) Are Entered into the Regression in Both Orders 
to Show Population Bias and Its Removal 

                     Order of Entry of                     Increment in 
Variable             Independent Variables    F            Percentage of 
                     into the Regression                   Variance Explained 

Corn^a 
  Rural/Urban                 1                6.26*            3.69 
  Prior Knowledge             2               24.49****        14.43 

  Prior Knowledge             1               29.57****        17.42 
  Rural/Urban                 2                1.18              .69 

City^b 
  Rural/Urban                 1               12.30***          8.03 
  Prior Knowledge             2                <1               1.26 

  Prior Knowledge             1               10.38**           6.75 
  Rural/Urban                 2                3.89             2.54 

Civil War^c 
  Rural/Urban                 1                5.22*            2.87 
  Prior Knowledge             2               37.53****        23.52 

  Prior Knowledge             1               41.78****        22.99 
  Rural/Urban                 2                <1                .53 

Civil War^d 
  Male/Female                 1                3.71             2.02 
  Prior Knowledge             2               40.89****        22.27 

  Prior Knowledge             1               42.21****        22.99 
  Male/Female                 2                2.38             1.30 

^a R² = .181, df error = 139, mean = 6.94, SD = 2.73. 
^b R² = .093, df error = 139, mean = 5.52, SD = 2.39. 
^c R² = .235, df error = 139, mean = 7.39, SD = 3.38. 
^d R² = .243, df error = 139. 

   *p < .05. 
  **p < .01. 
 ***p < .005. 
****p < .001. 
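
Why the same variable receives a different increment depending on its order of entry can be seen from the standard two-predictor formula for R² in terms of correlations. The sketch below is illustrative only: the function name `entry_order_increments` and the correlations in the usage note are invented for the example and are not taken from the study's data.

```python
from typing import Dict, Tuple


def entry_order_increments(r_y1: float, r_y2: float,
                           r_12: float) -> Dict[str, Tuple[float, float]]:
    """Increments in R² for two predictors entered in both orders.

    r_y1, r_y2: correlations of the criterion with predictors 1 and 2.
    r_12:       correlation between the two predictors.
    Returns, for each order, (increment at step 1, increment at step 2).
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    # Squared multiple correlation with both predictors in the equation.
    r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return {
        "x1_first": (r_y1 ** 2, r2_full - r_y1 ** 2),
        "x2_first": (r_y2 ** 2, r2_full - r_y2 ** 2),
    }
```

With made-up correlations of .5 and .4 with the criterion and .5 between the predictors, `entry_order_increments(0.5, 0.4, 0.5)` gives about (.25, .03) when the first predictor is entered first and (.16, .12) when it is entered second. The variable entered first is credited with the variance the two predictors share, which is the pattern Table 3 shows for population group and prior knowledge.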

Table 4 

Summary of Regression Analyses Demonstrating the Removal of Bias by the 
Randomization Method (summing raw scores across content areas) and the 
Prior Knowledge Method (partialling out the influence of prior knowledge 
before summing across content areas) 

                                        Variance              Total 
                          F             Due to Predictor      Variance 

Randomization Method^a 
  IQ                      24.12**           5.18               35.40 
  Population               2.89              .62 

Prior Knowledge Method^b 
  IQ                       5.92*             .25                6.15 
  Population               2.19              .09 

Note. df error = 138. 
All independent variables have one degree of freedom. 

^a Dependent variable = sum of three content scores. 
^b Dependent variable = sum of three residual content scores after 
the effects of prior knowledge have been removed from each. 

 *p < .05. 
**p < .001. 



Table 5 

Mean^a Correlations Between Increasingly General 
Vocabulary Tests and I.Q. and Reading Comprehension 

                                              Reading 
Vocabulary Test                               Comprehension     I.Q. 

Content relevant vocabulary questions         .39               .25 
(11 questions) 

Vocabulary questions not relevant to          .33 (.22)^b       .32 (.30)^b 
the passage content 
(22 questions) 

All vocabulary questions                      .35 (.31)^c       .37 (.32)^c 
(33 questions) 

^a mean of the 3 correlations between vocabulary and reading comprehension 
scores by content area. 

^b mean correlation with 11 item vocabulary test in which the 11 items were 
a random selection of half of the 22 item test in order to equate 
reliability with the content relevant test. 

^c mean correlation with 11 item vocabulary test in which the 11 items were 
a random selection of one third of the total 33 items in order to equate 
reliability with the content relevant test. 

Table 6 

Vocabulary Tests as Measures of General Verbal 
Ability and as Measures of Prior Knowledge 

                                          Increment in 
Variable                 F                Percentage of 
                                          Variance Explained 

Corn 
  IQ                     16.02****            8.78 
  General vocabulary     13.54****            7.42 
  Prior knowledge        15.87****            8.70 

City 
  IQ                      2.83                1.92 
  General vocabulary      <1                   .04 
  Prior knowledge         7.75**              5.25 

Civil War 
  IQ                     21.66****           11.26 
  General vocabulary      8.43***             4.38 
  Prior knowledge        25.32****           13.16 

Note. General verbal ability test = score on 22 complementary 
vocabulary items. 

Prior knowledge test = score on 11 content specific vocabulary 
items. 

Mean percentage of variance accounted for by general vocabulary 
= 3.9%. 

Mean percentage of variance accounted for by prior knowledge = 
9.0%. 

All independent variables have one degree of freedom. 
  **p < .01. 
 ***p < .005. 
****p < .001. 





Table 7 

Partitioning of Variance of Reading Comprehension 
Question Type Subscores 

                                                                Increment in 
Variable                                          F             Percentage of 
                                                                Variance Explained 

Within 
  Passage Contrast 1                              [illegible]     [illegible] 
  Passage Contrast 2                              33.95            1.27 
  Prior knowledge                                 [illegible]     [illegible] 
  Scriptal vs. Textual questions (Q1)            105.92****        3.[illegible] 
  Text explicit vs. Text implicit (Q2)            [illegible]      1.19 
  Centrality                                       <1             [illegible] 
  Prior knowledge x Q1                             2.55             .10 
  Prior knowledge x Q2                             <1               .02 
  Prior knowledge x Centrality                     7.33**           .27 
  Q1 x Centrality                                  4.59*            .17 
  Q2 x Centrality                                  3.68             .14 
  Prior knowledge x Centrality x Q1                <1               .03 
  Prior knowledge x Centrality x Q2                <1             [illegible] 
  Prior knowledge x Centrality x Text 
    availability                                   <1               .01 
  Prior knowledge x Centrality x 
    Question delay                                 <1             [illegible] 
  Q1 x Text availability                          16.10****         .60 
  Q1 x Question delay                              <1               .01 
  Q2 x Text availability                           1.10             .04 
  Q2 x Question delay                              2.23             .09 
  Centrality x Text availability                   7.62**           .28 
  Centrality x Question delay                      2.63             .10 

Table 7 (continued) 

  Prior knowledge x Centrality x 
    Text availability                             [illegible]       .37 
  Prior knowledge x Centrality x 
    Question delay                                [illegible]       .03 
  Prior knowledge x Q1 x Text availability         <1               .02 
  Prior knowledge x Q1 x Question delay            2.95             .11 
  Prior knowledge x Q2 x Text availability         <1               .02 
  Prior knowledge x Q2 x Question delay            2.09             .08 
  Q1 x Centrality x Text availability              8.32**           .31 
  Q1 x Centrality x Question delay                 <1               .03 
  Q2 x Centrality x Text availability              <1             [illegible] 
  Q2 x Centrality x Question delay                 <1               .01 

Note. All independent variables have one degree of freedom. 

R [subscript illegible] = .0365. 

Between subjects R² = .082, df error = 156. 
Within subjects R² = .1652, df error = 2,388. 



[Figure 1: line graph, not reproduced. Vertical axis: proportion of 
questions correct (.20 to .46). Horizontal axis: availability of text 
while answering questions (text available, text unavailable). Lines: 
central and peripheral questions for low prior knowledge and high prior 
knowledge readers.] 

Figure 1. The three-way interaction between reader prior 
knowledge, question centrality and long-term 
memory demands of the task on proportion of 
questions correct. 



[Figure 2: line graph, not reproduced. Vertical axis: proportion of 
questions correct (.25 to .55). Horizontal axis: availability of text 
while answering questions (text available, text unavailable). Lines: 
central and peripheral questions for scriptal questions and textual 
questions.] 

Figure 2. The effects of the interaction between question type, 
centrality of the question, and the long-term memory 
demands of the task on proportion of questions correct. 



